---
title: LobeChat Model Config Guide
description: >-
  Explore the capabilities of ChatGPT models from gpt-3.5-turbo to gpt-4-32k, understanding their speed, context limits, and cost. Learn about model parameters like temperature and top-p for better output.

tags:
  - ChatGPT Models
  - Model Parameters
  - Neural Networks
  - Language Understanding
  - Generation Capabilities
---

# Model Guide

## ChatGPT

- **gpt-3.5-turbo**: Currently the fastest generating ChatGPT model, it is faster but may sacrifice some text quality, with a context length of 4k.
- **gpt-4**: ChatGPT 4.0 has improved language understanding and generation capabilities compared to 3.5. It can better understand context and generate more accurate and natural responses. This is thanks to improvements in the GPT-4 model, including better language modeling and deeper semantic understanding, but it may be slower than other models, with a context length of 8k.
- **gpt-4-32k**: Similar to gpt-4, the context limit is increased to 32k tokens, with a higher cost.

## Concept of Model Parameters

LLM seems magical, but it is essentially a probability problem. The neural network generates a bunch of candidate words from the pre-trained model based on the input text and selects the high-probability ones as output. Most of the related parameters are associated with sampling (i.e., how to select the output from the candidate words).

### `temperature`

This parameter controls the randomness of the model's output. The higher the value, the greater the randomness. Generally, when the same prompt is input multiple times, the model's output varies each time.

- Set to 0: Generates a fixed output for each prompt
- Lower values: More concentrated and deterministic output
- Higher values: More random output (more creative)

<Callout>
  Generally, the longer and clearer the prompt, the better the quality and confidence of the model's
  output. In such cases, the temperature value can be adjusted appropriately. Conversely, if the
  prompt is short and ambiguous, setting a relatively high temperature value will result in unstable
  model output.
</Callout>

<br />

### `top_p`

`top_p` is also a sampling parameter, but it differs from temperature in its sampling method. Before outputting, the model generates a bunch of tokens, and these tokens are ranked based on their quality. In the top-p sampling mode, the candidate word list is dynamic, and tokens are selected from the tokens based on a percentage. Top\_p introduces randomness in token selection, allowing other high-scoring tokens to have a chance of being selected, rather than always choosing the highest-scoring one.

<Callout>
  `top_p` is similar to randomness, and it is generally not recommended to change it together with
  the randomness of temperature.
</Callout>

<br />

### `presence_penalty`

The presence penalty parameter can be seen as a punishment for repetitive content in the generated text. When this parameter is set high, the generation model will try to avoid producing repeated words, phrases, or sentences. Conversely, if the presence penalty parameter is set low, the generated text may contain more repetitive content. By adjusting the value of the presence penalty parameter, control over the originality and diversity of the generated text can be achieved. The importance of this parameter is mainly reflected in the following aspects:

- Enhancing the originality and diversity of the generated text: In certain applications, such as creative writing or generating news headlines, it is necessary for the generated text to have high originality and diversity. By increasing the value of the presence penalty parameter, the amount of repeated content in the generated text can be effectively reduced, thereby enhancing its originality and diversity.
- Preventing the generation of loops and meaningless content: In some cases, the generation model may produce repetitive or meaningless text that usually fails to convey useful information. By appropriately increasing the value of the presence penalty parameter, the probability of generating such meaningless content can be reduced, thereby improving the readability and practicality of the generated text.

<Callout>
  It is worth noting that the presence penalty parameter, in conjunction with other parameters such
  as temperature and top-p, collectively influences the quality of the generated text. Compared to
  other parameters, the presence penalty parameter primarily focuses on the originality and
  repetitiveness of the text, while the temperature and top-p parameters more significantly affect
  the randomness and determinism of the generated text. By adjusting these parameters reasonably,
  comprehensive control over the quality of the generated text can be achieved.
</Callout>

### `frequency_penalty`

It is a mechanism that penalizes frequently occurring new vocabulary in the text to reduce the likelihood of the model repeating the same word. The larger the value, the more likely it is to reduce repeated words.

- `-2.0` When the morning news started broadcasting, I found that my TV now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now now **(The highest frequency word is "now", accounting for 44.79%)**
- `-1.0` He always watches the news in the early morning, in front of the TV watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch watch **(The highest frequency word is "watch", accounting for 57.69%)**
- `0.0` When the morning sun poured into the small diner, a tired postman appeared at the door, carrying a bag of letters in his hands. The owner warmly prepared a breakfast for him, and he started sorting the mail while enjoying his breakfast. **(The highest frequency word is "of", accounting for 8.45%)**
- `1.0` A girl in deep sleep was woken up by a warm ray of sunshine, she saw the first ray of morning light, surrounded by birdsong and flowers, everything was full of vitality. (The highest frequency word is "of", accounting for 5.45%)
- `2.0` Every morning, he would sit on the balcony to have breakfast. Under the soft setting sun, everything looked very peaceful. However, one day, when he was about to pick up his breakfast, an optimistic little bird flew by, bringing him a good mood for the day. (The highest frequency word is "of", accounting for 4.94%)

<br />

### `reasoning_effort`

The `reasoning_effort` parameter controls the strength of the reasoning process. This setting affects the depth of reasoning the model performs when generating a response. The available values are **`low`**, **`medium`**, and **`high`**, with the following meanings:

- **low**: Lower reasoning effort, resulting in faster response times. Suitable for scenarios where quick responses are needed, but it may sacrifice some reasoning accuracy.
- **medium** (default): Balances reasoning accuracy and response speed, suitable for most scenarios.
- **high**: Higher reasoning effort, producing more detailed and complex responses, but slower response times and greater token consumption.

By adjusting the `reasoning_effort` parameter, you can find an appropriate balance between response speed and reasoning depth based on your needs. For example, in conversational scenarios, if fast responses are a priority, you can choose low reasoning effort; if more complex analysis or reasoning is needed, you can opt for high reasoning effort.

<Callout>
  This parameter is only applicable to reasoning models, such as OpenAI's `o1`, `o1-mini`,
  `o3-mini`, etc.
</Callout>
